Introduction and Background

I became interested in the human body’s ability to sprint over certain distances after learning that in the 100m dash, the shortest of the Olympic Track and Field events, the sprinters are already slowing down towards the end. This was intriguing to me, and I wanted to learn more.

To begin this project, I took data from https://speedendurance.com/, which publishes data from many track and field events, including each runner’s splits. In track and field, a “split” is the time a runner takes to complete a given section of the race, so it describes the runner’s pace over that section.

100 meter dash

Plotted below are some of the fastest 100 meter dashes in Olympic history, with a wide range of years, from Carl Lewis in 1988 to Usain Bolt in 2008. The 0-10m split is not shown, as that is largely a factor of the runner’s reaction time.

We can see from the plot above that runners do indeed slow down near the end of the race, and in some cases, like Bolt and Mo, they slow down quite a lot.

Correlation tests

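The cor.test function tests for association between paired samples. By default it uses Pearson’s product-moment correlation coefficient; Kendall’s tau and Spearman’s rho are also available. A p-value less than 0.05 is typically considered statistically significant, meaning that a correlation this large would be unlikely if the two variables were actually unrelated. The p-value measures the strength of the evidence for an association, not the strength of the correlation itself, which is given by the coefficient.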

cor.test(olympic100m$TOTAL, olympic100m$RT)
cor.test(olympic100m$TOTAL, olympic100m$`Start-10m`)
cor.test(olympic100m$TOTAL, olympic100m$`10-20m`)
cor.test(olympic100m$TOTAL, olympic100m$`20-30m`)
cor.test(olympic100m$TOTAL, olympic100m$`30-40m`)
cor.test(olympic100m$TOTAL, olympic100m$`40-50m`)
cor.test(olympic100m$TOTAL, olympic100m$`50-60m`)
cor.test(olympic100m$TOTAL, olympic100m$`60-70m`)
cor.test(olympic100m$TOTAL, olympic100m$`70-80m`)
cor.test(olympic100m$TOTAL, olympic100m$`80-90m`)
cor.test(olympic100m$TOTAL, olympic100m$`90-100m`)
cor.test(olympic100m$TOTAL, olympic100m$`Age`)
cor.test(olympic100m$TOTAL, olympic100m$`height(cm)`)
cor.test(olympic100m$TOTAL, olympic100m$`weight(lb)`)
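The repeated calls above can also be written as a loop that collects the correlation coefficient and p-value for every column in one table, which makes the splits easier to compare. This is only a minimal sketch: the `splits` data frame and its numbers are made up for illustration and are not the values in olympic100m.

```r
# Hypothetical toy data: these numbers are made up for illustration only
# and are NOT the values from the olympic100m data set.
splits <- data.frame(
  TOTAL     = c(9.69, 9.79, 9.84, 9.86, 9.92, 9.96),
  RT        = c(0.165, 0.132, 0.174, 0.120, 0.136, 0.150),
  `50-60m`  = c(0.82, 0.84, 0.84, 0.85, 0.86, 0.86),
  `90-100m` = c(0.90, 0.86, 0.88, 0.87, 0.88, 0.89),
  check.names = FALSE  # keep column names such as "50-60m" unchanged
)

# Run cor.test of TOTAL against every other column,
# keeping the Pearson coefficient (r) and the p-value.
predictors <- setdiff(names(splits), "TOTAL")
results <- t(sapply(predictors, function(col) {
  test <- cor.test(splits$TOTAL, splits[[col]])
  c(r = unname(test$estimate), p.value = test$p.value)
}))

# Sort so the smallest p-value comes first.
results <- results[order(results[, "p.value"]), ]
print(results)
```

With the real data, replacing `splits` with olympic100m should give the same numbers as the individual cor.test calls, just in a single sorted table.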

After testing the correlation between the final time and the different splits, we see that the 50-60m split has the lowest p-value, meaning it shows the most statistically significant association with the final time.

We’ll see more of why the 50-60m split matters in the 100m dash as we explore further down.

400 meter dash

Let’s look at splits from the 400 meter dash. This is a much longer event in track and field, so let’s see if anything stands out.

Plotted below are the 50m splits from the 2008 Olympic Trials Men’s 400m Finals.

Wow!

This is much more bizarre than the 100m dash: the runners start out very fast, then slow down, speed up again, and then gradually slow down.

Running the same correlation test on the different splits of the 400m dash, we see that the 300-350m split, the penultimate 50m, has the strongest correlation to the final time.

Now that we have found the most significant split from each race, the 50-60m split from the 100m dash and the 300-350m split from the 400m dash, let’s see whether we can spot anything peculiar with the naked eye.

Speed!

Since the data we are analyzing includes distance and time, we can calculate each runner’s average speed over every split by dividing the distance covered by the time taken. Let’s animate the runners’ speed for the 100m dash in real time.
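As a rough sketch of that calculation, the average speed over a split is simply the segment length divided by the split time. The split times below are hypothetical values I made up for illustration, not a real runner’s data, and the variable names are my own.

```r
# Hypothetical 10 m split times in seconds (made-up values, not the real data).
split_times <- c(`10-20m` = 1.02, `20-30m` = 0.91, `30-40m` = 0.87,
                 `40-50m` = 0.85, `50-60m` = 0.83, `60-70m` = 0.84,
                 `70-80m` = 0.85, `80-90m` = 0.86, `90-100m` = 0.88)

segment_length_m <- 10                       # each 100m split covers 10 metres

# Average speed over a segment = distance / time.
speed_ms  <- segment_length_m / split_times  # metres per second
speed_kmh <- speed_ms * 3.6                  # kilometres per hour

# The fastest segment is the one with the shortest split time.
fastest <- names(which.max(speed_ms))
print(round(speed_ms, 2))
print(fastest)
```

For the 400m dash the same calculation applies with a segment length of 50 metres, since those splits are recorded every 50m.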

100m dash animated

400m dash animated

Just from looking at the animations, we get some idea of why those two splits are so important. For the 400m dash, I’ve included an interactive plot that shows the significance of the 300-350m split more clearly.

Interactive Plot (400m)

For the 100m dash, I’ve made a regression tree, which is a predictive modeling approach commonly used in statistics, data mining and machine learning. A regression tree predicts a numeric outcome (here, the final time) by repeatedly splitting the observations on whichever variable best separates them.

Regression Tree (100m)

As we can see in the regression tree above, the very first thing this model looks at to determine a quicker or slower final time is the 50-60m split. Interestingly, the reaction time plays a much smaller role than one would expect.
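For anyone who wants to try something similar, here is a minimal sketch of fitting a regression tree in R. It assumes the rpart package, which is my choice for the example and may not be what produced the tree above, and it uses simulated data in which the final time depends strongly on one split and only weakly on reaction time. The column names RT, split_50_60 and TOTAL are placeholders.

```r
library(rpart)  # assumption: the package used for the tree above is not stated

set.seed(1)
n <- 60

# Simulated, hypothetical data; NOT the olympic100m data set.
toy <- data.frame(
  RT          = runif(n, 0.12, 0.19),  # reaction time in seconds
  split_50_60 = runif(n, 0.82, 0.90)   # 50-60m split time in seconds
)

# By construction, the final time depends strongly on the split
# and only weakly on the reaction time.
toy$TOTAL <- 2.3 + 9 * toy$split_50_60 + toy$RT + rnorm(n, sd = 0.02)

# method = "anova" fits a regression tree (numeric outcome).
fit <- rpart(TOTAL ~ RT + split_50_60, data = toy, method = "anova")
print(fit)

# The variable chosen for the first (root) split is in the first row of fit$frame.
root_var <- as.character(fit$frame$var[1])
print(root_var)
```

The variable at the root is the one whose single split reduces the prediction error the most, which is why the first split of a tree is a quick indicator of the most influential predictor.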